StreamInterface: prevent socket/reader-thread leak on handshake failure by nightjoker7 · Pull Request #918 · meshtastic/python

nightjoker7 · 2026-04-23T00:09:09Z

Problem

If StreamInterface.__init__ calls connect() or waitForConfig() and either raises (handshake timeout, config error, bad stream), the partially-constructed object propagates the exception up — but the reader thread spawned in connect() keeps running and the underlying stream/socket stays open. The caller never received a reference to the half-initialized instance, so they cannot call close() themselves.

In a reconnect loop (any Python client retrying after a flaky connection), each failed attempt leaks one socket and one thread, and the leak compounds quickly under load.

This is consistent with the symptom shape in meshtastic/firmware#10101 (Station G2 TCP drops with MeshMonitor): the MeshMonitor client retries, each retry leaks file descriptors, and the resulting OS pressure eventually shows up as refused connections until the leaked sockets time out.

Fix

Wrap the __init__ handshake calls in try/except. On failure, call self.close() (suppressed inside contextlib.suppress(Exception) so a secondary error doesn't shadow the original), then re-raise.
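A minimal, self-contained sketch of that cleanup shape may help. `FakeInterface`, its `fail_connect` / `fail_close` flags, and `try_construct` are hypothetical names invented for illustration; this is not the real `StreamInterface`, only the same try/except/suppress/re-raise pattern.

```python
import contextlib


class FakeInterface:
    """Hypothetical stand-in for StreamInterface (not the meshtastic class)."""

    def __init__(self, fail_connect=False, fail_close=False):
        self._fail_connect = fail_connect
        self._fail_close = fail_close
        self.closed = False
        try:
            self.connect()
        except Exception:
            # Best-effort cleanup. Any secondary error raised by close() is
            # swallowed, so the bare `raise` re-raises the handshake error.
            with contextlib.suppress(Exception):
                self.close()
            raise

    def connect(self):
        if self._fail_connect:
            raise TimeoutError("handshake timed out")

    def close(self):
        self.closed = True
        if self._fail_close:
            raise OSError("close failed")


def try_construct(**kwargs):
    """Return the exception raised by the constructor, or None on success."""
    try:
        FakeInterface(**kwargs)
    except Exception as exc:
        return exc
    return None
```

Calling `try_construct(fail_connect=True, fail_close=True)` should hand back the `TimeoutError` from the simulated handshake rather than the `OSError` from `close()`, which is the property the suppress block is there to protect.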

Also handle the case where close() is called from the cleanup path before connect() started the reader thread — _rxThread.join() raises RuntimeError when called on a never-started thread.
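The `RuntimeError` behavior is standard-library `threading.Thread.join()` semantics and can be reproduced in isolation. The helper name `safe_join` below is an assumption made for this sketch, not something from the patch.

```python
import threading


def safe_join(thread, timeout=None):
    """Join `thread`; return False if it had never been started."""
    try:
        thread.join(timeout)
    except RuntimeError:
        # Thread.join() raises RuntimeError when the thread was never started.
        return False
    return True


# One thread that is never started, one that runs to completion.
never_started = threading.Thread(target=lambda: None)
started = threading.Thread(target=lambda: None)
started.start()
```

Here `safe_join(never_started)` takes the `except` branch, while `safe_join(started)` joins normally.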

Tests added

Four unit tests in test_stream_interface.py:

close() is safe when the reader thread was never started
__init__ cleanup runs when connect() raises, original exception propagates
__init__ cleanup runs when waitForConfig() raises, original exception propagates
A failure in close() during cleanup does not shadow the original handshake failure

All four pass locally and exercise the previously uncovered patch lines.
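For readers without the test file at hand, a test of this kind might be sketched as follows. `ToyInterface` and `check_cleanup_on_wait_for_config_failure` are hypothetical names, and the class only mirrors the cleanup shape described above; the tests in test_stream_interface.py may be structured differently.

```python
import contextlib
from unittest import mock


class ToyInterface:
    """Hypothetical class with the same cleanup shape as the patched __init__."""

    def __init__(self):
        try:
            self.connect()
            self.waitForConfig()
        except Exception:
            with contextlib.suppress(Exception):
                self.close()
            raise

    def connect(self):
        pass

    def waitForConfig(self):
        pass

    def close(self):
        pass


def check_cleanup_on_wait_for_config_failure():
    """Force waitForConfig() to raise; report the exception and close() calls."""
    caught = None
    with mock.patch.object(
        ToyInterface, "waitForConfig", side_effect=TimeoutError("no config")
    ):
        with mock.patch.object(ToyInterface, "close") as fake_close:
            try:
                ToyInterface()
            except TimeoutError as exc:
                caught = exc
            close_calls = fake_close.call_count
    return caught, close_calls
```

The helper returns the propagated exception together with the number of `close()` calls, so a test can assert on both.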

Diff

meshtastic/stream_interface.py: +9 / -2 (the actual fix)
meshtastic/tests/test_stream_interface.py: +95 / 0 (the tests)

Copilot

Pull request overview

This PR addresses a resource-leak failure mode in StreamInterface where exceptions during initial protocol handshake (connect()/waitForConfig()) could leave a running reader thread and an open underlying transport (serial/TCP), even though the constructor raised before returning an instance to the caller.

Changes:

Wrap connect() + waitForConfig() in StreamInterface.__init__ with exception handling that attempts to close() before re-raising.
Make StreamInterface.close() tolerate RuntimeError from joining an unstarted reader thread.

Copilot · 2026-04-23T23:20:03Z

+                # Nothing to join; safe to ignore.
+                pass


If connect() fails before the reader thread is started (e.g., serial write raises before _rxThread.start()), this RuntimeError path will skip joining but also won’t close the underlying stream. For SerialInterface (which opens the port before calling StreamInterface.__init__), this means the serial port can remain open even though close() was invoked. Consider explicitly closing self.stream (and setting it to None) when the thread was never started / is not alive, instead of just pass.

Suggested change

-                # Nothing to join; safe to ignore.
-                pass
+                # In that case the reader thread cannot run _disconnected(),
+                # so close the stream here to avoid leaking the resource.
+                if self.stream is not None:
+                    with contextlib.suppress(Exception):
+                        self.stream.close()
+                    self.stream = None

Copilot · 2026-04-23T23:20:04Z

        if self._rxThread != threading.current_thread():
-            self._rxThread.join()  # wait for it to exit
+            try:
+                self._rxThread.join()  # wait for it to exit
+            except RuntimeError:


join() without a timeout can block indefinitely if the reader thread is stuck in a blocking _readBytes() (e.g., TCPInterface uses blocking socket.recv() with no timeout). With the new __init__ exception handler calling self.close(), a handshake timeout against a silent TCP peer can hang here and never re-raise the original exception. Consider using a bounded join (and letting transport-specific close() logic interrupt the read), or otherwise ensuring blocking reads are interrupted before joining.
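One way to read this suggestion is sketched below: interrupt the blocking read first, then join with a timeout. The function names are hypothetical, a `socket.socketpair()` stands in for the TCP connection to a silent peer, and the sketch assumes that `shutdown()` wakes a `recv()` blocked in another thread, which is platform behavior and not something the PR states.

```python
import contextlib
import socket
import threading


def reader(sock):
    """Read until the peer closes or the socket is shut down."""
    try:
        while sock.recv(1024):
            pass
    except OSError:
        pass


def close_with_bounded_join(sock, thread, timeout=2.0):
    """Interrupt the blocking read, then wait at most `timeout` seconds."""
    with contextlib.suppress(OSError):
        # Assumption: shutting down the socket unblocks a pending recv().
        sock.shutdown(socket.SHUT_RDWR)
    thread.join(timeout)
    sock.close()
    return not thread.is_alive()


def demo():
    local, peer = socket.socketpair()  # `peer` never sends anything
    thread = threading.Thread(target=reader, args=(local,), daemon=True)
    thread.start()
    exited = close_with_bounded_join(local, thread)
    peer.close()
    return exited
```

The return value of `close_with_bounded_join` tells the caller whether the reader actually exited within the timeout, so a stuck thread can be logged instead of blocking the re-raise indefinitely.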

Copilot · 2026-04-23T23:20:04Z

+            try:
+                self.connect()
+                if not noProto:
+                    self.waitForConfig()
+            except Exception:
+                # Handshake failed (timeout, config error, bad stream). The caller
+                # never receives a reference to this half-initialized instance, so
+                # they cannot call close() themselves. If we don't clean up here,
+                # the reader thread (already started by connect()) keeps running
+                # and the underlying stream/socket leaks — the leak compounds on
+                # every retry from the caller's reconnect loop.
+                with contextlib.suppress(Exception):
+                    self.close()
+                raise


New failure-path behavior (calling close() from __init__ when connect() / waitForConfig() raises) is important for leak prevention but currently isn’t covered by unit tests. Since there are existing tests for stream_interface.py, consider adding a test that forces connect() or waitForConfig() to raise and asserts cleanup occurs (e.g., close() is invoked and doesn’t raise when the thread wasn’t started).

codecov · 2026-04-23T23:23:07Z

Codecov Report

❌ Patch coverage is 46.15385% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.52%. Comparing base (cec79a7) to head (9011baa).

Files with missing lines	Patch %	Lines
meshtastic/stream_interface.py	46.15%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #918      +/-   ##
==========================================
- Coverage   61.57%   61.52%   -0.06%     
==========================================
  Files          25       25              
  Lines        4448     4457       +9     
==========================================
+ Hits         2739     2742       +3     
- Misses       1709     1715       +6

Flag	Coverage Δ
unittests	`61.52% <46.15%> (-0.06%)`	⬇️


…re in __init__

If connect() or waitForConfig() raises during __init__ (handshake timeout, bad stream, config error), the reader thread started by connect() keeps running and the underlying stream/socket stays open, but the caller never receives a reference to the half-initialized instance, so they cannot call close() themselves. The leak compounds on every retry from a caller's reconnect loop.

Fix: wrap connect() + waitForConfig() in try/except; call self.close() on any exception before re-raising. Also guard close() against RuntimeError from joining an unstarted reader thread (happens when close() runs from a failed __init__ before connect() could spawn it).

Discovered while debugging a real-world Meshtastic firmware crash where a passive logger's retrying TCPInterface() calls against a node with 250-entry NodeDB produced a reconnect storm: every retry triggered a full config+NodeDB dump on the node, compounding heap pressure, which then exposed null-deref bugs in Router::perhapsDecode / MeshService (firmware side fixed in meshtastic/firmware#10226 and #10229). The client-side leak is independent of those firmware bugs and worth fixing on its own.

thebentern requested a review from Copilot April 23, 2026 23:15

Copilot started reviewing on behalf of thebentern April 23, 2026 23:15

Copilot AI reviewed Apr 23, 2026

nightjoker7 force-pushed the fix-stream-handshake-leak branch from 9011baa to bfe38ac Compare April 25, 2026 20:03

nightjoker7 wants to merge 1 commit into meshtastic:master from nightjoker7:fix-stream-handshake-leak
